Overview

Dataset Statistics

Number of Variables 25
Number of Rows 418612
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 247.9 MB
Average Row Size in Memory 621.0 B
Variable Types
  • Geography: 1
  • Categorical: 6
  • Numerical: 17
  • DateTime: 1
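
The two memory figures above can be cross-checked against each other. A minimal sketch, assuming the report's "MB" means 2^20 bytes (the report does not say which unit it uses); the variable names are illustrative only:

```python
# Figures copied from the Dataset Statistics block.
total_mb = 247.9
n_rows = 418_612

# Assumption: "MB" is read as a mebibyte (2**20 bytes).
total_bytes = total_mb * 2**20
avg_row_bytes = total_bytes / n_rows

print(f"{avg_row_bytes:.1f} B per row")
```

Under that reading the result rounds to the 621.0 B shown as the average row size; with 10^6 bytes per MB it would not.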

Dataset Insights

w and cyclist_points have similar distributions [Similar Distribution]
points is skewed [Skewed]
uci_points is skewed [Skewed]
delta is skewed [Skewed]
stage is skewed [Skewed]
total_stages is skewed [Skewed]
w is skewed [Skewed]
cyclist_points is skewed [Skewed]
month is skewed [Skewed]
cyclist_id has a high cardinality: 4121 distinct values [High Cardinality]
_url has a high cardinality: 2751 distinct values [High Cardinality]
cyclist_team has a high cardinality: 90 distinct values [High Cardinality]
uci_points has 177578 (42.42%) zeros [Zeros]
delta has 86577 (20.68%) zeros [Zeros]
w has 366376 (87.52%) zeros [Zeros]
cyclist_points has 366376 (87.52%) zeros [Zeros]
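
The zero percentages in the insights follow directly from the zero counts and the row count. A small sketch that recomputes them (the dictionary name is illustrative, the numbers are copied from the report):

```python
n_rows = 418_612

# Zero counts as listed in the insights above.
zero_counts = {
    "uci_points": 177_578,
    "delta": 86_577,
    "w": 366_376,
    "cyclist_points": 366_376,
}

# Share of rows equal to zero, in percent, rounded to two decimals.
zero_share = {col: round(100 * n / n_rows, 2) for col, n in zero_counts.items()}

for col, pct in zero_share.items():
    print(f"{col}: {pct}%")
```

w and cyclist_points share the same zero count, which fits the "similar distributions" insight.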

Variables


nationality

categorical

Approximate Distinct Count 31
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 30180792

Length

Mean 7.0973
Standard Deviation 2.4124
Median 6
Minimum 5
Maximum 14

Sample

1st row France
2nd row France
3rd row France
4th row France
5th row France

Letter

Count 2940572
Lowercase Letter 2491520
Space Separator 30440
Uppercase Letter 449052
Dash Punctuation 0
Decimal Number 0

cyclist_id

categorical

Approximate Distinct Count 4121
Approximate Unique (%) 1.0%
Missing 0
Missing (%) 0.0%
Memory Size 33371753

Length

Mean 14.72
Standard Deviation 3.2902
Median 14
Minimum 7
Maximum 36

Sample

1st row gerard-rue
2nd row gerard-rue
3rd row gerard-rue
4th row gerard-rue
5th row gerard-rue

Letter

Count 5667475
Lowercase Letter 5667475
Space Separator 0
Uppercase Letter 0
Dash Punctuation 491916
Decimal Number 2582
  • cyclist_id contains many words: 4121 words

_url

categorical

Approximate Distinct Count 2751
Approximate Unique (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory Size 38723863

Length

Mean 27.5054
Standard Deviation 2.5321
Median 28
Minimum 21
Maximum 35

Sample

1st row tour-de-france/199...
2nd row tour-de-france/199...
3rd row tour-de-france/199...
4th row tour-de-france/199...
5th row tour-de-france/199...

Letter

Count 7333779
Lowercase Letter 7333779
Space Separator 0
Uppercase Letter 0
Dash Punctuation 1130811
Decimal Number 2212269
  • _url contains many words: 2751 words

points

numerical

Approximate Distinct Count 32
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5023344
Mean 94.1215
Minimum 21
Maximum 430
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • points is skewed right (γ1 = 3.0629)

Quantile Statistics

Minimum 21
5-th Percentile 50
Q1 58
Median 82
Q3 100
95-th Percentile 235
Maximum 430
Range 409
IQR 42

Descriptive Statistics

Mean 94.1215
Standard Deviation 55.8118
Variance 3114.9598
Sum 3.94e+07
Skewness 3.0629
Kurtosis 10.2086
Coefficient of Variation 0.593
  • points is not normally distributed (p-value 1.3741308234565474e-14)
  • points has 27408 outliers
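
Several of the figures for points are derived from others in the same block. The sketch below recomputes them from the reported values. The outlier fences use the common 1.5 × IQR rule, which is an assumption here: the report does not state how it counts outliers.

```python
# Values copied from the points block.
q1, q3 = 58, 100
minimum, maximum = 21, 430
mean, std = 94.1215, 55.8118
n_rows = 418_612

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation
cv = std / mean                  # coefficient of variation
total = mean * n_rows            # sum of the column

# Assumption: Tukey-style fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(iqr, value_range, round(variance, 2), round(cv, 3), f"{total:.3g}")
print(lower_fence, upper_fence)
```

Under that assumed rule the upper fence is 163, below the 95-th percentile of 235, so more than 5% of rows would be flagged. The reported 27408 outliers are about 6.5% of the rows.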

uci_points

numerical

Approximate Distinct Count 20
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5023344
Mean 44.2924
Minimum 0
Maximum 800
Zeros 177578
Zeros (%) 42.4%
Negatives 0
Negatives (%) 0.0%
  • uci_points is skewed right (γ1 = 4.0568)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 6
Q3 60
95-th Percentile 120
Maximum 800
Range 800
IQR 60

Descriptive Statistics

Mean 44.2924
Standard Deviation 86.1926
Variance 7429.1655
Sum 1.8541e+07
Skewness 4.0568
Kurtosis 20.8144
Coefficient of Variation 1.946
  • uci_points is not normally distributed (p-value 1.982497122730901e-21)
  • uci_points has 18798 outliers

length

numerical

Approximate Distinct Count 937
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5023344
Mean 170739.4199
Minimum 3600
Maximum 305000
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • length is skewed left (γ1 = -1.283)

Quantile Statistics

Minimum 3600
5-th Percentile 29600
Q1 157400
Median 179500
Q3 202000
95-th Percentile 244000
Maximum 305000
Range 301400
IQR 44600

Descriptive Statistics

Mean 170739.4199
Standard Deviation 55866.9663
Variance 3.1211e+09
Sum 7.1474e+10
Skewness -1.283
Kurtosis 1.8336
Coefficient of Variation 0.3272
  • length is not normally distributed (p-value 0.0018522882733240848)
  • length has 43709 outliers
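
The γ1 values in this report are skewness coefficients: negative for a longer left tail (as for length, whose mean lies below its median), positive for a longer right tail. A minimal sketch of the moment-based definition on made-up toy samples, not on the dataset itself. Whether the report applies a bias correction is not stated, so the function below is an assumption about the general form only.

```python
def moment_skewness(values):
    """Third standardized moment, m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical toy samples chosen only to show the sign convention.
left_tailed = [1, 8, 9, 9, 10, 10, 11]
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 4, 11]

for name, sample in [("left", left_tailed), ("symmetric", symmetric), ("right", right_tailed)]:
    print(name, round(moment_skewness(sample), 3))
```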

startlist_quality

numerical

Approximate Distinct Count 403
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 1116.9391
Minimum 156
Maximum 2047
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • startlist_quality is skewed right (γ1 = 0.7613)

Quantile Statistics

Minimum 156
5-th Percentile 676
Q1 867
Median 997
Q3 1309
95-th Percentile 1812
Maximum 2047
Range 1891
IQR 442

Descriptive Statistics

Mean 1116.9391
Standard Deviation 361.785
Variance 130888.4045
Sum 4.6756e+08
Skewness 0.7613
Kurtosis -0.3061
Coefficient of Variation 0.3239
  • startlist_quality is not normally distributed (p-value 0.00037913988068132774)
  • startlist_quality has 7089 outliers

date

datetime

Approximate Distinct Count 98259.459
Approximate Unique (%) 23.5%
Missing 0
Missing (%) 0.0%
Memory Size 6697792
Minimum 1991-05-06 01:00:22
Maximum 2023-07-29 05:52:14

position

numerical

Approximate Distinct Count 12879
Approximate Unique (%) 3.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 0.4966
Minimum 0
Maximum 1
Zeros 2750
Zeros (%) 0.7%
Negatives 0
Negatives (%) 0.0%
  • position is skewed right (γ1 = 0.0004)

Quantile Statistics

Minimum 0
5-th Percentile 0.04706
Q1 0.25
Median 0.5
Q3 0.7475
95-th Percentile 0.9483
Maximum 1
Range 1
IQR 0.4975

Descriptive Statistics

Mean 0.4966
Standard Deviation 0.2887
Variance 0.08333
Sum 207893.8546
Skewness 0.00037643
Kurtosis -1.2001
Coefficient of Variation 0.5813
  • position is not normally distributed (p-value 1.9605521691235343e-05)
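
The variance (0.08333) and kurtosis (-1.2001) of position are close to the textbook values for a uniform distribution on [0, 1]. The sketch below derives those reference values exactly. It also suggests, as an inference rather than something the report states, that the kurtosis figures are excess kurtosis (0 for a normal distribution).

```python
from fractions import Fraction

# Central moments of a uniform distribution on [0, 1] (standard results).
m2 = Fraction(1, 12)  # variance
m4 = Fraction(1, 80)  # fourth central moment

# Excess kurtosis: fourth standardized moment minus 3.
excess_kurtosis = m4 / m2**2 - 3

print(float(m2), float(excess_kurtosis))
```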

cyclist_age

numerical

Approximate Distinct Count 28
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5023344
Mean 28.7901
Minimum 19
Maximum 56
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • cyclist_age is skewed right (γ1 = 0.398)

Quantile Statistics

Minimum 19
5-th Percentile 23
Q1 26
Median 28
Q3 32
95-th Percentile 36
Maximum 56
Range 37
IQR 6

Descriptive Statistics

Mean 28.7901
Standard Deviation 3.9522
Variance 15.6199
Sum 1.2052e+07
Skewness 0.398
Kurtosis -0.358
Coefficient of Variation 0.1373
  • cyclist_age is not normally distributed (p-value 0.0014216035842179715)
  • cyclist_age has 331 outliers

is_tarmac

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 28916486
  • The largest value (True) is over 11.98 times larger than the second largest value (False)

Length

Mean 4.0771
Standard Deviation 0.2667
Median 4
Minimum 4
Maximum 5

Sample

1st row True
2nd row True
3rd row True
4th row True
5th row True

Letter

Count 1706706
Lowercase Letter 1288094
Space Separator 0
Uppercase Letter 418612
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (True, False) take over 50.0%
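
The letter counts for is_tarmac are enough to recover the split between the two categories. A sketch, assuming every value is exactly the string "True" or "False" (consistent with the distinct count of 2 and the length range of 4 to 5):

```python
n_rows = 418_612
total_letters = 1_706_706  # "Count" in the Letter block

len_true, len_false = len("True"), len("False")

# Each "False" contributes one letter more than "True".
n_false = total_letters - len_true * n_rows
n_true = n_rows - n_false

# Each value has one uppercase letter; the rest are lowercase.
lowercase = (len_true - 1) * n_true + (len_false - 1) * n_false
ratio = n_true / n_false

print(n_true, n_false, lowercase, round(ratio, 2))
```

The recomputed lowercase count matches the 1288094 in the report, and the ratio rounds to the 11.98 quoted in the imbalance note.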

cyclist_team

categorical

Approximate Distinct Count 90
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 33954724
  • The largest value (other) is over 11.35 times larger than the second largest value (liberty-seguros-wurth-team-2005)

Length

Mean 16.1126
Standard Deviation 8.7232
Median 15
Minimum 5
Maximum 36

Sample

1st row denmark-1991
2nd row france-1978
3rd row norway-1987
4th row denmark-1991
5th row france-1978

Letter

Count 4669488
Lowercase Letter 4669488
Space Separator 0
Uppercase Letter 0
Dash Punctuation 692264
Decimal Number 1355848

delta

numerical

Approximate Distinct Count 2673
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5023344
Mean 472.2875
Minimum 0
Maximum 46380
Zeros 86577
Zeros (%) 20.7%
Negatives 0
Negatives (%) 0.0%
  • delta is skewed right (γ1 = 28.4045)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 12
Median 205
Q3 735
95-th Percentile 1709
Maximum 46380
Range 46380
IQR 723

Descriptive Statistics

Mean 472.2875
Standard Deviation 943.8603
Variance 890872.2517
Sum 1.9771e+08
Skewness 28.4045
Kurtosis 1297.9893
Coefficient of Variation 1.9985
  • delta is not normally distributed (p-value 2.8172321796671243e-24)
  • delta has 16577 outliers

race_name

categorical

Approximate Distinct Count 27
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 33164834

Length

Mean 14.2257
Standard Deviation 2.5374
Median 14
Minimum 8
Maximum 22

Sample

1st row tour-de-france
2nd row tour-de-france
3rd row tour-de-france
4th row tour-de-france
5th row tour-de-france

Letter

Count 5208441
Lowercase Letter 5208441
Space Separator 0
Uppercase Letter 0
Dash Punctuation 744457
Decimal Number 2156

year

numerical

Approximate Distinct Count 33
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5023344
Mean 2010.8165
Minimum 1991
Maximum 2023
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • year is skewed left (γ1 = -0.4838)

Quantile Statistics

Minimum 1991
5-th Percentile 1995
Q1 2005
Median 2012
Q3 2017
95-th Percentile 2022
Maximum 2023
Range 32
IQR 12

Descriptive Statistics

Mean 2010.8165
Standard Deviation 8.2369
Variance 67.8463
Sum 8.4175e+08
Skewness -0.4838
Kurtosis -0.6489
Coefficient of Variation 0.004096
  • year is not normally distributed (p-value 3.207638763062643e-08)

stage

numerical

Approximate Distinct Count 97
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 0.5487
Minimum 0.04545
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • stage is skewed right (γ1 = 0.0304)

Quantile Statistics

Minimum 0.04545
5-th Percentile 0.09524
Q1 0.2857
Median 0.55
Q3 0.8333
95-th Percentile 1
Maximum 1
Range 0.9545
IQR 0.5476

Descriptive Statistics

Mean 0.5487
Standard Deviation 0.3046
Variance 0.09279
Sum 229700.5731
Skewness 0.03038
Kurtosis -1.2745
Coefficient of Variation 0.5551
  • stage is not normally distributed (p-value 1.0220325619197256e-15)

race_category

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 31506237
  • The largest value (grand_tour) is over 1.62 times larger than the second largest value (7_stages)

Length

Mean 10.2636
Standard Deviation 2.2851
Median 10
Minimum 5
Maximum 14

Sample

1st row tour_de_france
2nd row tour_de_france
3rd row tour_de_france
4th row tour_de_france
5th row tour_de_france

Letter

Count 3700597
Lowercase Letter 3700597
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 109414
  • The top 2 categories (grand_tour, 7_stages) take over 50.0%

total_stages

numerical

Approximate Distinct Count 14
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 15.3163
Minimum 1
Maximum 23
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • total_stages is skewed left (γ1 = -0.816)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 7
Median 20
Q3 21
95-th Percentile 21
Maximum 23
Range 22
IQR 14

Descriptive Statistics

Mean 15.3163
Standard Deviation 7.1751
Variance 51.4818
Sum 6.4116e+06
Skewness -0.816
Kurtosis -1.0579
Coefficient of Variation 0.4685
  • total_stages is not normally distributed (p-value 2.7593136675754194e-16)

total_racers

numerical

Approximate Distinct Count 162
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 159.3352
Minimum 20
Maximum 206
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • total_racers is skewed left (γ1 = -1.1746)

Quantile Statistics

Minimum 20
5-th Percentile 110
Q1 145
Median 164
Q3 177
95-th Percentile 195
Maximum 206
Range 186
IQR 32

Descriptive Statistics

Mean 159.3352
Standard Deviation 26.3405
Variance 693.8244
Sum 6.67e+07
Skewness -1.1746
Kurtosis 2.4094
Coefficient of Variation 0.1653
  • total_racers has 11769 outliers

w

numerical

Approximate Distinct Count 16
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 1.3531
Minimum 0
Maximum 50
Zeros 366376
Zeros (%) 87.5%
Negatives 0
Negatives (%) 0.0%
  • w is skewed right (γ1 = 5.9704)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 10
Maximum 50
Range 50
IQR 0

Descriptive Statistics

Mean 1.3531
Standard Deviation 5.5689
Variance 31.0125
Sum 566414
Skewness 5.9704
Kurtosis 41.8421
Coefficient of Variation 4.1157
  • w is not normally distributed (p-value 4.584041047477573e-25)
  • w has 52236 outliers

cyclist_points

numerical

Approximate Distinct Count 432
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 0.6513
Minimum 0
Maximum 104.3689
Zeros 366376
Zeros (%) 87.5%
Negatives 0
Negatives (%) 0.0%
  • cyclist_points is skewed right (γ1 = 10.9571)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 3.9417
Maximum 104.3689
Range 104.3689
IQR 0

Descriptive Statistics

Mean 0.6513
Standard Deviation 3.2313
Variance 10.4412
Sum 272650.4472
Skewness 10.9571
Kurtosis 188.2607
Coefficient of Variation 4.9611
  • cyclist_points is not normally distributed (p-value 4.440804803719333e-25)
  • cyclist_points has 52236 outliers
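
Both w and cyclist_points report 52236 outliers, which is exactly the number of non-zero rows. One reading, offered as an assumption because the report does not state its outlier rule, is a quartile-based fence: with Q1 = Q3 = 0 the IQR is 0, so any value above 0 falls outside the fence.

```python
n_rows = 418_612
zeros = 366_376  # same zero count for w and cyclist_points

q1 = q3 = 0
iqr = q3 - q1

# Assumption: upper fence at Q3 + 1.5 * IQR, which collapses to 0 here.
upper_fence = q3 + 1.5 * iqr

# Every non-zero (hence positive) value lies above that fence.
non_zero_rows = n_rows - zeros

print(upper_fence, non_zero_rows)
```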

month

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 6.1263
Minimum 2
Maximum 11
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • month is skewed left (γ1 = -0.0483)

Quantile Statistics

Minimum 2
5-th Percentile 3
Q1 5
Median 6
Q3 8
95-th Percentile 9
Maximum 11
Range 9
IQR 3

Descriptive Statistics

Mean 6.1263
Standard Deviation 2.0859
Variance 4.3509
Sum 2.5646e+06
Skewness -0.04825
Kurtosis -1.0067
Coefficient of Variation 0.3405
  • month is not normally distributed (p-value 1.5504787028713835e-11)

day

numerical

Approximate Distinct Count 31
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 15.3322
Minimum 1
Maximum 31
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • day is skewed right (γ1 = 0.1623)

Quantile Statistics

Minimum 1
5-th Percentile 3
Q1 9
Median 14
Q3 22
95-th Percentile 29
Maximum 31
Range 30
IQR 13

Descriptive Statistics

Mean 15.3322
Standard Deviation 7.9697
Variance 63.5157
Sum 6.4182e+06
Skewness 0.1623
Kurtosis -1.0202
Coefficient of Variation 0.5198
  • day is not normally distributed (p-value 1.0446973548248522e-10)

BSA

numerical

Approximate Distinct Count 679
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 219.1744
Minimum 168.6886
Maximum 276.9502
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • BSA is skewed right (γ1 = 0.1554)

Quantile Statistics

Minimum 168.6886
5-th Percentile 196.1235
Q1 209.6554
Median 218.4266
Q3 228.1307
95-th Percentile 245.0531
Maximum 276.9502
Range 108.2616
IQR 18.4753

Descriptive Statistics

Mean 219.1744
Standard Deviation 14.6456
Variance 214.4928
Sum 9.1749e+07
Skewness 0.1554
Kurtosis 0.1178
Coefficient of Variation 0.06682
  • BSA is not normally distributed (p-value 0.0044775892936192024)
  • BSA has 4964 outliers

aug_profile

numerical

Approximate Distinct Count 2129
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6697792
Mean 1.1576
Minimum 0.2005
Maximum 2.8523
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • aug_profile is skewed right (γ1 = 0.3661)

Quantile Statistics

Minimum 0.2005
5-th Percentile 0.2956
Q1 0.668
Median 1.0483
Q3 1.6329
95-th Percentile 2.1987
Maximum 2.8523
Range 2.6518
IQR 0.9649

Descriptive Statistics

Mean 1.1576
Standard Deviation 0.5995
Variance 0.3594
Sum 484572.9328
Skewness 0.3661
Kurtosis -0.8305
Coefficient of Variation 0.5179
  • aug_profile is not normally distributed (p-value 0.00019132488014668637)

Interactions

Correlations

Missing Values